THE CLAIMS 

What is claimed is: 

1. A method comprising:

responsive to a first single instruction, identifying a first operand including four 
packed data elements, (r3, r2, r1 and r0), and identifying a second operand including four 
packed coefficients, (w3, w2, w1 and w0), generating four packed first products, (r3w3, 
r2w2, r1w1 and r0w0) and storing said four packed first products at a first destination 
identified by said first single instruction; 

responsive to a second single instruction, identifying a third operand including four 
packed data elements, (s3, s2, s1 and s0), and identifying a fourth operand including four 
packed coefficients, (w7, w6, w5 and w4), generating four packed second products, 
(s3w7, s2w6, s1w5 and s0w4) and storing said four packed second products at a second 
destination identified by said second single instruction; and 

responsive to a third single instruction, identifying a fifth operand including the four 
packed first products and identifying a sixth operand including the four packed second 
products, generating four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and 
r0w0+r1w1) and storing them at a third destination identified by said third single 
instruction. 
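
The three steps recited in Claim 1 can be illustrated with a minimal software sketch. It assumes plain Python lists stand in for the packed operands, with index 0 holding the least significant element; the function names `packed_multiply` and `horizontal_add` and the sample values are hypothetical and are not taken from the claims.

```python
def packed_multiply(data, coeffs):
    """Element-wise product of two four-element packed operands."""
    return [d * w for d, w in zip(data, coeffs)]


def horizontal_add(first, second):
    """Add adjacent pairs within each operand.

    Pair sums from `first` fill the two low result elements,
    pair sums from `second` fill the two high result elements.
    """
    return [first[0] + first[1], first[2] + first[3],
            second[0] + second[1], second[2] + second[3]]


# Index 0 is the least significant element (r0, w0, s0, w4).
r = [1, 2, 3, 4]          # r0..r3
w_low = [10, 20, 30, 40]  # w0..w3
s = [5, 6, 7, 8]          # s0..s3
w_high = [50, 60, 70, 80]  # w4..w7

first_products = packed_multiply(r, w_low)    # r0w0..r3w3
second_products = packed_multiply(s, w_high)  # s0w4..s3w7
sums = horizontal_add(first_products, second_products)
# sums[0] = r0w0+r1w1, sums[1] = r2w2+r3w3,
# sums[2] = s0w4+s1w5, sums[3] = s2w6+s3w7
```

Read from the highest element down, `sums` lists its entries in the same order as the four packed sums recited in the claim.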

2. The method of Claim 1 further comprising: 

responsive to the first single instruction, overwriting said four packed data elements 
in the first operand with said four packed first products; 

responsive to the second single instruction, overwriting said four packed data 
elements in the second operand with said four packed second products; and 

responsive to the third single instruction, overwriting said four packed first products 
in the fifth operand with said four packed sums. 

3. The method of Claim 1 further comprising: 

responsive to a fourth single instruction identifying a seventh operand including the 
four packed first products and identifying an eighth operand including the four packed 
second products, generating four packed differences, (s2w6-s3w7, s0w4-s1w5, r2w2-r3w3, 
and r0w0-r1w1) and storing them at a fourth destination identified by said fourth 
single instruction. 

4. The method of Claim 3, said fourth destination storing elements to represent saturated 
packed differences, (s2w6-s3w7, s0w4-s1w5, r2w2-r3w3, and r0w0-r1w1). 
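
A saturated packed difference can be sketched as follows. The claims do not fix an element width here, so this example assumes signed 16-bit elements; the helper names `saturate16` and `horizontal_sub_saturated` are hypothetical, and index 0 is again taken to be the least significant element.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed element width: signed 16 bits


def saturate16(x):
    """Clamp x to the signed 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, x))


def horizontal_sub_saturated(first, second):
    """Subtract within adjacent pairs (even element minus odd element).

    Pair differences from `first` fill the two low result elements,
    pair differences from `second` fill the two high result elements.
    """
    pairs = [(first[0], first[1]), (first[2], first[3]),
             (second[0], second[1]), (second[2], second[3])]
    return [saturate16(even - odd) for even, odd in pairs]


result = horizontal_sub_saturated([30000, -10000, 5, 7],
                                  [-30000, 10000, 100, 40])
# 30000 - (-10000) = 40000 clamps to 32767
# -30000 - 10000 = -40000 clamps to -32768
```

With saturation, a difference that overflows the element range is pinned to the nearest representable value rather than wrapping to a value of the opposite sign.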

5. The method of Claim 1, said third destination storing elements to represent horizontal 
addition operations in a register specified by bits three through five of the third single 
instruction. 

6. The method of Claim 5, said third destination storing elements to represent saturated 
arithmetic sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1). 

7. The method of Claim 5, said third destination comprising packed 16-bit elements to 
represent said four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and 
r0w0+r1w1). 

8. The method of Claim 5, said third destination comprising packed 32-bit elements to 
represent said four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and 
r0w0+r1w1). 

9. The method of Claim 8, said third destination storing elements to represent horizontal 
floating-point arithmetic operations. 
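
For packed 32-bit elements holding floating-point values, a horizontal addition might be modeled as below. This is a sketch under the assumption that each element is an IEEE 754 single-precision value; the names `to_f32` and `horizontal_add_f32` are hypothetical, and Python's standard `struct` module is used only to round each result to 32-bit precision.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest 32-bit single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def horizontal_add_f32(first, second):
    """Add adjacent pairs, rounding each sum to single precision.

    Index 0 is the least significant element; sums from `first` fill
    the low half of the result and sums from `second` the high half.
    """
    return [to_f32(first[0] + first[1]), to_f32(first[2] + first[3]),
            to_f32(second[0] + second[1]), to_f32(second[2] + second[3])]


sums = horizontal_add_f32([1.5, 2.25, 0.5, 0.25], [1.0, 2.0, 3.0, 4.0])
```

The sample inputs are exactly representable in single precision, so the sums come out exact; values such as 0.1 would be rounded when stored in a 32-bit element.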

10. A processor comprising: 

a storage area to store a first packed data operand, a second packed data operand and 
a third packed data operand; and 

an execution unit coupled to said storage area, the execution unit to execute a first 
single instruction on data elements in said first packed data operand and said second 
packed data operand to generate a plurality of data elements in a first packed data result, 
at least one of said plurality of data elements in said first packed data result being the 
result of an intra-add operation performed on a first pair of data elements of said first 
packed data operand and at least one other of said plurality of data elements in said first 
packed data result being the result of an intra-add operation performed on a second pair 
of data elements of said second packed data operand, the execution unit to execute a 
second single instruction on data elements in said third packed data operand and said 
second packed data operand to generate a plurality of data elements in a second packed 
data result, at least one of said plurality of data elements in said second packed data 
result being the result of an intra-subtract operation performed on a third pair of data 
elements of said third packed data operand and at least one other of said plurality of data 
elements in said second packed data result being the result of an intra-subtract operation 
performed on the second pair of data elements of said second packed data operand. 
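
The behavior of the execution unit recited in claim 10 can be sketched in software. The example assumes four-element operands represented as Python lists with index 0 as the least significant element; the function name `intra_op` and the sample values are hypothetical.

```python
import operator


def intra_op(dest, src, op):
    """Apply `op` to each adjacent pair of elements within an operand.

    Results from pairs of `dest` fill the low half of the packed result,
    results from pairs of `src` fill the high half.
    """
    def pairs(v):
        return [op(v[i], v[i + 1]) for i in range(0, len(v), 2)]
    return pairs(dest) + pairs(src)


first = [1, 2, 3, 4]
second = [10, 20, 30, 40]
third = [9, 5, 8, 2]

# First single instruction: intra-add on the first and second operands.
add_result = intra_op(first, second, operator.add)
# Second single instruction: intra-subtract on the third and second operands.
sub_result = intra_op(third, second, operator.sub)
```

Both results draw on pairs of the second operand, once for an intra-add and once for an intra-subtract, which matches the operand reuse recited in the claim.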

11. The processor of claim 10, wherein 

each of said plurality of data elements in said first packed data result is the result 
of an intra-add operation with signed saturation. 

12. The processor of claim 11, wherein 

each of said plurality of data elements in said second packed data result is the 
result of an intra-subtract operation with signed saturation. 

13. The processor of claim 10, the execution unit, in response to said first single instruction, 
overwriting said first packed data operand with said first packed data result. 

14. An apparatus comprising: 

a first storage area for storing a first packed data operand, containing at least an A 
data element and a B data element packed together; 

a second storage area for storing a second packed data operand containing at least a C 
data element and a D data element packed together; and 

an arithmetic circuit responsive to execution of a first single instruction to add the A 
data element and the B data element to generate a first result element of a third packed 
data, and to add the C data element and the D data element to generate a second result 
element of the third packed data, the arithmetic circuit responsive to execution of a 
second single instruction to subtract the A data element and the B data element to 
generate a third result element of a fourth packed data, and to subtract the C data element 
and the D data element to generate a fourth result element of the fourth packed data. 



15. The apparatus of claim 14 wherein 

each of said plurality of data elements in said third packed data and said fourth 
packed data are the result of an intra-add operation or an intra-subtract operation with 
signed saturation. 



16. The apparatus of claim 14 further comprising: 

a decoder to decode said first single instruction and said second single instruction 
and to enable execution of said first single instruction and said second single instruction; 
and 

a register file comprising said first storage area and said second storage area, to 
provide the A data element, the B data element, the C data element and the D data 
element responsive to the execution of said first or second single instruction. 



17. A system comprising: 

a first storage area for storing a first packed data operand, containing at least an A 
data element and a B data element packed together; 

a second storage area for storing a second packed data operand containing at least a C 
data element and a D data element packed together; 

a third storage area for storing a third packed data operand containing at least an E 
data element and an F data element packed together; 

a decoder to decode a first single instruction and to enable execution of said first 
single instruction, the decoder to decode a second single instruction and to enable 
execution of said second single instruction; 

an arithmetic circuit responsive to enabling execution of said first single instruction 
to add the A data element and the B data element to generate a first result element of a 
fourth packed data, and to add the C data element and the D data element to generate a 
second result element of the fourth packed data, the arithmetic circuit responsive to 
enabling execution of said second single instruction to subtract the E data element and 
the F data element to generate a third result element of a fifth packed data, and to 
subtract the C data element and the D data element to generate a fourth result element of 
the fifth packed data; 

a wireless communication device to send and receive digital data over a wireless 
network; 

a memory to store digital data and software including the first and second single 
instructions and to supply the first and second single instructions to said decoder; and 

an input output system responsive to said software to interface with the wireless 
communication device receiving data to process or sending data processed at least in part 
by said first and second single instructions. 

18. The system of claim 17, wherein 

each of said first and second result elements of the fourth packed data are the result 
of an intra-add operation with signed saturation. 



19. The system of claim 18, wherein 

each of said third and fourth result elements of the fifth packed data are the result of 
an intra-subtract operation with signed saturation. 



20. The system of claim 17, wherein 

each of said first and second result elements of the fourth packed data and said third 
and fourth result elements of the fifth packed data are 16-bit results. 



21. The system of claim 17, wherein 

each of said first and second result elements of the fourth packed data and said third 
and fourth result elements of the fifth packed data are 32-bit results. 



22. The system of claim 21, wherein 

each of said first and second result elements of the fourth packed data and said third 
and fourth result elements of the fifth packed data are floating point results. 

23. The system of claim 21, wherein 

each of said first and second result elements of the fourth packed data and said third 
and fourth result elements of the fifth packed data are unsaturated signed integer results. 

24. A computer software product including one or more recordable media having executable 
instructions stored thereon including a first instruction and a second instruction which, 
when executed by a processing device, cause the processing device to: 

access a first packed data operand, containing at least an A data element and a B data 
element packed together at a first storage area; 

access a second packed data operand containing at least a C data element and a D 
data element packed together at a second storage area; 

add the A data element and the B data element to generate a first result element of a 
third packed data in response to said first instruction; 

add the C data element and the D data element to generate a second result element of 
the third packed data in response to said first instruction; 

access a fourth storage area for storing a fourth packed data operand containing at 
least an E data element and an F data element packed together; 

access the second packed data operand containing at least the C data element and the 
D data element packed together at the second storage area; 

subtract the E data element and the F data element to generate a third result element 
of a fifth packed data in response to said second instruction; and 

subtract the C data element and the D data element to generate a fourth result 
element of the fifth packed data in response to said first instruction. 


25. The computer software product of Claim 24 which, when executed by a processing 
device, further cause the processing device to: 

overwrite said first packed data operand with said third packed data in response to 
said first instruction. 



26. The computer software product of Claim 24 which, when executed by a processing 
device, further cause the processing device to: 

overwrite said fourth packed data operand with said fifth packed data in response to 
said second instruction. 



27. The computer software product of claim 24, wherein 

each of said first and second result elements of the third packed data and said third 
and fourth result elements of the fifth packed data are saturated signed values. 



28. The computer software product of claim 24, wherein 

each of said first and second result elements of the third packed data and said third 
and fourth result elements of the fifth packed data are 16-bit integer values. 



29. The computer software product of claim 24, wherein 

each of said first and second result elements of the third packed data and said third 
and fourth result elements of the fifth packed data are 32-bit integer values. 